lecture one
on artificial intelligence and face recognition
1
xiaowan : sh-iao one
x.yi@arts.ac.uk https://vimeo.com/user65401583 xiaowan-yi.com
2
(hopefully) Get you interested in AI
Clarify some notions around AI
Rethink numbers
Have an awesome face detector app on your iPhone, made by yourself
3
“a boring lecture on Thursday afternoon and everyone is late”
generated by https://huggingface.co/spaces/stabilityai/stable-diffusion
4
AI is fundamentally interesting…
Why?
5
How I got interested in AI
Cozy environment
Curious about ourselves (mind-blowing moments)
It’s NOT hard (easy to follow because it closely connects to our daily experience)
6
what is this sound?
https://www.youtube.com/watch?v=bbLDfueL7eU
7
bringing the vid…
how do you feel?
8
Once upon a time…
9
10
Pattern recognition: find regularity, enable
prediction making
11
travel back to our era…
what do we have now?
12
The significance of pattern recognition to humanity
“We distinguished predator from prey; and poisonous plants from nourishing ones - enhancing our chance to live and reproduce, and passing on our genes. We used pattern recognition in astronomy and astrology, where different cultures, recognizing the patterns of stars in the skies, projected different symbols and pictures for constellations. We used it to predict the passing of the seasons, including how every culture determined that the passage of a comet was taken as an omen.”
— “When Knowledge Conquered Fear”, third episode of the documentary tv series Cosmos: A Spacetime Odyssey
13
Pattern recognition as an essential part of our experience:
Guess the bonfire sound, get on the right tube line (daily task solving), read my handwriting (language), appreciate music (art), etc.
Name more…
14
Artificial Intelligence
what is it??
“intelligence, made by humans”
it is still an unfinished goal
we usually hate artificial something
15
intelligence{
16
Intelligence is a big bag word
It includes the ability to solve complex problems or make decisions with outcomes benefiting the actor
and many more…
17
Is intelligence exclusive to humans?
18
We have:
Gorilla uses a stick to test depth of water
https://en.wikipedia.org/wiki/Tool_use_by_animals#:~:text=Chimpanzees%20are%20sophisticated%20tool%20users,in%20the%20Republic%20of%20Congo.
Fish makes art:
https://www.youtube.com/watch?v=VQr8xDk_UaY
Dog talks:
https://www.youtube.com/watch?v=QKQK7EIcq9Y
19
What is intelligence?
Intelligence is always specific; most of the time we are interested in human intelligence (for simplicity, we omit “human” in the rest of the slides)
What are we able to do with intelligence? Human task solving, art making, etc. What else?
20
If we manage to build a machine that can solve tasks and make art, can we say we have achieved AI?
Intelligence = task solving + art making?
emotion
21
Thinking about intelligence is a way of appreciation
Simple things are hard
“Be the subject of your own thoughts”
22
}
23
artificial intelligence{
24
Here are some human-made concepts of subjects:
Engineering science:
make tools
— life gets easier
Natural science:
discover and explain phenomena
— curiosity satisfied
25
These two are actually intertwined
So is AI!
26
AI is a tool (like engineering science)
how to use the tool (CoreML)
how to make the tool (CreateML)
27
AI is also our attempt to understand intelligence better (natural science)
“We don’t think we have understood something unless we can build it fRoM sCrAtCh.”
28
From scratch: use machines created by humans, not other organic beings we don’t fully understand yet
Then…
Where shall I start if I want to make machines intelligent, i.e. able to solve tasks and make art?
29
Recall:
Pattern recognition as an essential part of human experience:
Get on the right tube line(daily task solving), read my handwriting(language), appreciate music(art), etc.
30
Let’s make the machine do pattern recognition!
31
Let’s make the machine do pattern recognition!
Do you know how cool it is?
A few hours of computation by our little metal box
~=
Hundreds of thousands of years of human evolution and accumulated experience
Shout out to Patrick Winston https://youtu.be/Unzc731iCUY?t=2323
32
Here is what we’ve talked about so far:
Pattern recognition is everywhere and amazing
We want to understand intelligence by prototyping it using human-made machines (“AI”)
One intermediate goal of prototyping intelligence is to make machines do pattern recognition (because we guess that pattern recognition is an essential part of intelligence)
33
How does this image-generating AI
demonstrate its pattern recognition ability?
“a boring lecture on Thursday afternoon and everyone is late”
generated by https://huggingface.co/spaces/stabilityai/stable-diffusion
34
}
35
Insert self introduction here :)
Research https://anonymous84654.github.io/RAVE_anonymous/
Drum and AI https://vimeo.com/93213203
36
Story time
What are AI researchers like? “sleeping”
Geoffrey Hinton, the godfather of DL, has recently been taking inspiration from what the purpose of sleep is and why we have dreams…
https://www.youtube.com/watch?v=2EDP4v-9TUA
37
“a lucid dream”
generated by https://huggingface.co/spaces/stabilityai/stable-diffusion
38
Studying AI is a thought-provoking process
And it will get us to know ourselves better (lots of fun facts to come…)
Enjoy !
39
Noodling time..
40
Intelligence is not exclusive to humans
Other species can also make and use tools, solve tasks and create art…
And now that machines can do something too
What makes us us then?
41
“humanity ~= human capability - artificial intelligence”
42
Is it true that we assume machine intelligence is always a subset of human intelligence?
Is it possible that machines can actually do things that we intrinsically cannot do? That is, are some machine intelligence capabilities beyond those of human intelligence?
43
Representation{
44
What is representation?
“descriptor” “features” “characteristics”
45
Why do we need to have the notion of representation:
It is inevitable as a result of our “flaw”, more on this later
It is also a powerful tool towards task solving
46
What is apple?
47
Pattern recognition question 1
What is apple as a fruit vs. Apple as tech company?
48
(Efficient) representations:
edible ?
Upper case?
etc.
49
Pattern recognition question 2
What is apple vs. pear?
50
(Efficient) representations:
shape ?
its taste ?
51
Good representation simplifies our task
To excel at pattern recognition ~= To find a good representation
ANIME time ! DOMAIN EXPANSION
https://www.youtube.com/watch?v=nmvkhLz8t7I
52
(Efficient) representations:
shape ?
its taste ?
Meet “papple” …
https://www.theguardian.com/lifeandstyle/wordofmouth/2012/may/21/the-papple-tasted-and-tested
53
To get out of ambiguity, just ask about the
context
Why do you want to know if it is an apple or pear?
54
Representation is contextual
It depends on the problem given: different tasks have different efficient representations
55
Perhaps we can never describe/represent one thing as it is, nothing less and nothing more…
our natural language is doomed to fail (our “flaw”)
56
Representation
descriptive, captures some characteristics
contextual, task-dependent
perspective, always partial
57
Another related notion
Abstraction:
Taking away irrelevant details, reducing the representation to essential characteristics
58
Joel’s slides on “what is computational
thinking” https://jgl.github.io/DiplomaInAppleDevelopment-AutumnWinter2022/codingOne/lecture_01.html#42
Lots of things are connected. Studying AI is a brilliant manifestation of computational thinking.
59
Numbers{
60
What is the domain where we can have almost perfect
representation (aka without ambiguity)?
61
I have three pens
what is “I” what is “have” what is “pen”
what is “three”
62
Numbers
Count
Measure
..?
63
007
Numbers
Count
Measure
Label
We always need a “protocol” (like an agreement on how to interpret numbers, or like a dictionary for looking up a number’s meaning) when using numbers in real life.
64
65
Though numbers provide an almost perfect representation domain, they are “fictional”
In real life, we don’t see numbers on their own walking on the street
66
When we encounter numbers in the real world, there are always real-world meanings attached to them
We always need an interpretation guide (“protocol”) when using numbers in real life.
67
Why do I want to talk about numbers?
Our poor human-made machines can only deal with numbers
Numbers bring in maths, which is our DOMAIN EXPANSION
It is SUPER important to grasp the idea of using (numbers + protocol) to represent things, for doing fancy AI stuff
68
}
End of noodling,
Starting ordinary lecture mode…
69
promise
by the end of this unit, you will:
know how artificial intelligence works in practice
make a wide range of iOS apps including face detection, speech recognition, activity classification, etc…
have more thoughts on artificial intelligence as a cultural concept
70
scope of this module
describe how machine learning works in practice (Knowledge)
construct applications with the Core ML framework (Process)
discuss artificial intelligence as a cultural concept (Enquiry)
71
for each lecture, we will go: describe -> construct -> discuss
72
lecture plan today
machine learning model introduction (40 mins)
face detection introduction (40 mins)
write your awesome face detection iOS app (1 hr)
discussion
and breaks in-between…
73
AI? ML?
interchangeable throughout this unit
74
Machine “learning”
Mechanical Turk
https://en.wikipedia.org/wiki/Mechanical_Turk
“Learning”: we train machines to solve tasks, machines are not quite autonomous
75
AI is a general term (cultural impact, etc. )
ML is a technical term (algorithm, code framework, etc.)
Deep Learning…? Neural Networks…?
DL(NN) ⊂ ML ⊂ AI
“interchangeable”
76
before diving into how machines learn…
77
how do we learn??
questions
how you can reach out to the world and gather knowledge
intuitions
how you can internalise the knowledge efficiently
78
how do we learn? - questions
whatever questions you have in mind — interrupt me anytime !
having a question means you already know enough to know what is not known [1]
i’m also learning from your questions
79
how do we learn? - intuitions
intuitions connect academic jargon to daily real life
some intuitions on ML can be gained just by introspection
i’m also here to help by sharing mine
a lot of AI developments are largely inspired by how we think we ourselves are put together
“attention mechanism”
80
81
machine learning model
82
before diving into machine learning model…
83
“information era”
84
the information we receive from the world falls mainly into four categories:
image (video)
text (language)
sound (music, speech)
numbers (the weather in degrees Celsius, your birthday, etc.)
can you think of any information that is not from the four categories? there are…
85
Terminology used by AI nerds
data category = data modality
86
my mind-blowing moment:
information from any of these three categories (image, text and sound) can be represented by just a bunch of numbers
using numbers only
87
image in numbers:
✴two numbers for its width and height (how many pixels)
✴for each pixel, what the rgb values are
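The two bullets above can be sketched in Swift. This is only an illustration: the type names `RGB` and `TinyImage` and the row-by-row pixel order are assumptions made up for this example, not how any real image format or Apple framework stores its data.

```swift
// One pixel = three numbers (red, green, blue), each from 0 to 255.
struct RGB: Equatable {
    var r: UInt8
    var g: UInt8
    var b: UInt8
}

// A hypothetical tiny image: two numbers for the size, then a list of pixels.
struct TinyImage {
    let width: Int
    let height: Int
    var pixels: [RGB]   // our protocol: row by row, left to right, top to bottom

    init(width: Int, height: Int, fill: RGB) {
        self.width = width
        self.height = height
        self.pixels = Array(repeating: fill, count: width * height)
    }

    // Look up the pixel at column x, row y.
    func pixel(x: Int, y: Int) -> RGB {
        return pixels[y * width + x]
    }

    // Change the pixel at column x, row y.
    mutating func setPixel(x: Int, y: Int, to color: RGB) {
        pixels[y * width + x] = color
    }
}

var tinyImage = TinyImage(width: 3, height: 2, fill: RGB(r: 0, g: 0, b: 0))
tinyImage.setPixel(x: 2, y: 1, to: RGB(r: 255, g: 0, b: 0))  // one red pixel
print(tinyImage.pixels.count)  // 6 pixels, i.e. 18 colour numbers
```

Note that the numbers alone are not enough: without the agreed pixel order, nobody could tell which pixel is the red one.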
88
language in numbers:
✴we will talk about this later
✴but for now just think about looking up a word in a dictionary using a page number and an index
✴(also math itself is a language….)
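The dictionary idea can be tried out in a few lines of Swift. This is a toy sketch: the six-word `vocabulary` and the `encode`/`decode` helpers are invented for illustration, and real language models use far larger and cleverer dictionaries.

```swift
// A hypothetical mini dictionary: each word's position in the list is its number.
let vocabulary = ["a", "boring", "everyone", "is", "late", "lecture"]

// Words -> numbers. Words missing from the dictionary are simply skipped here.
func encode(_ sentence: String) -> [Int] {
    return sentence.split(separator: " ").compactMap { word in
        vocabulary.firstIndex(of: String(word))
    }
}

// Numbers -> words, using the same dictionary (the protocol).
func decode(_ numbers: [Int]) -> String {
    return numbers.map { vocabulary[$0] }.joined(separator: " ")
}

print(encode("everyone is late"))  // [2, 3, 4]
print(decode([0, 1, 5]))           // a boring lecture
```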
89
sound in numbers:
this is a wav file of a drum beat, screenshot with a lot of zooming in
each dot represents a number
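To see where such dots could come from, here is a minimal sketch that generates the numbers for a pure tone instead of a drum beat. The sample rate (8000 numbers per second) and the 440 Hz pitch are assumed values picked for the example, not properties of the wav file on the slide.

```swift
import Foundation

// Assumed values for illustration only.
let sampleRate = 8000.0   // how many numbers we store per second of sound
let frequency = 440.0     // pitch of the tone in Hz

// Each sample is one number: the height of the sound wave at one instant.
func sineSamples(count: Int) -> [Double] {
    return (0..<count).map { n in
        sin(2.0 * Double.pi * frequency * Double(n) / sampleRate)
    }
}

let samples = sineSamples(count: 8)
print(samples)  // eight numbers between -1 and 1, the first one is 0.0
```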
90
why do we care about representing things in numbers?
because computers can only deal with numbers
because with numbers we can do our DOMAIN EXPANSION aka math
91
machine learning model
? ?
92
what is a model
93
some common ML models:
face detection model
dog-or-cat image classification model
speech recognition model
language translation model
What do they have in common?
they are all nice tools but each one sounds very different from the others!
94
Though they sound different, they share the same structure of what a model is (recall a “class” or a “protocol” in Swift)
each ML model takes in input, does some process and generates output
“doing some process” is usually where the elegant maths, computations and perhaps confusion happen
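Since the slide recalls Swift protocols, the shared structure can be written down as one. This is a sketch only: `Model`, `predict` and `TextLengthModel` are hypothetical names for illustration and are not part of CoreML or any Apple framework.

```swift
// A hypothetical protocol: every model takes an input, does some process, returns an output.
protocol Model {
    associatedtype Input
    associatedtype Output
    func predict(_ input: Input) -> Output
}

// A toy stand-in for a real model: its "process" is just counting characters.
struct TextLengthModel: Model {
    func predict(_ input: String) -> Int {
        return input.count
    }
}

let toyModel = TextLengthModel()
print(toyModel.predict("hello"))  // 5
```

A real face detection or speech recognition model would fit the same shape, with only the input type, the output type and the process changing.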
95
“what are the input and output of this model” is always the first question to ask when you try to understand a model
it also helps answer this question: what does this ML model do
96
what are the possible input and output?
- data, aka information
ML models take in information, do some process, and generate (hopefully useful) information
remember those four categories of information?
our candidates of input and output are:
image, text, sound, numbers
97
“what are the input and output of this model” is always the first question to ask..
It also helps answer this question: what does this ML model do
98
try this…
what does a speech recognition model do?
try answering using “given a <your educated guess on the input>, the speech recognition model generates <your educated guess on the output>”
image, text, audio, numbers
99
and try this…
what does a dog-or-cat image classification model do?
try answering using “given a <your guess on the input>, the dog-or-cat image classification model generates <your guess on the output>”
image, text, sound, numbers
100
How to shepherd the meaning of a bunch of numbers in the output?
How do we know what each output number represents?
During training (next unit), we specify which output number means what (the “protocol”); this protocol stays consistent across the model’s life span
the protocol for interpreting each output number should be passed on to the model’s users
101
An example
Task context:
Use numbers to represent whether the image is a dog or cat
The number representation I come up with:
[0, 1]
The protocol I’m going to pass around:
hello this is a protocol created by covfefe for the dog-or-cat numeric representation and I can bullshit whatever I want here as long as I explain how to interpret [0, 1] somewhere are you with me
The first number (with index 0) in this array represents the probability of this image being a dog image
The second number (with index 1) in the array represents the probability of this image being a cat image
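Here is a minimal sketch of a model user applying this protocol in Swift. The `labels` array and the `interpret` function are made up for the example; the only thing taken from the slide is the agreement that index 0 means dog and index 1 means cat.

```swift
// The protocol from the slide: index 0 = probability of dog, index 1 = probability of cat.
let labels = ["dog", "cat"]

// Turn the output numbers back into a meaning by picking the most probable label.
func interpret(_ output: [Double]) -> String {
    precondition(output.count == labels.count, "output must follow the protocol")
    var best = 0
    for i in output.indices where output[i] > output[best] {
        best = i
    }
    return labels[best]
}

print(interpret([0.0, 1.0]))  // cat
print(interpret([0.8, 0.2]))  // dog
```

If the protocol were lost or swapped, the same numbers [0, 1] would be read as “dog”, which is why it has to travel with the model.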
102
now try this
103
output
input
“a boring lecture where
everyone is sleeping”
generated by https://huggingface.co/spaces/stabilityai/stable-diffusion
104
think of ML models as tools
how to make tools
we have not talked about how to make ML models, aka “doing the process” part
we will be looking at how to make ML models at a later time
how to use tools (our focus of this unit)
we need to know each tool’s specifications
what are the specifications of ML models?
- input and output
105
coincidentally, Apple’s ML frameworks have a similar division too
how to make tools - CreateML
how to use tools - CoreML
106
also coincidentally, the “input and output” thinking of an ML model manifests in how Apple defines an ML model in its CoreML framework:
107
can you find this “input, process and output” mechanism in us as human beings?
https://www.youtube.com/watch?v=X5fD0Evny4w&t=36s
108
end of machine learning model introduction
question?
109
face detection model
110
while your memory is fresh..
what does a face detection model do?
try answering using “given a <your guess on the input>, the face detection model generates <your guess on the output>”
image, text, sound, numbers
111
face detection model
given an image (with or without faces, could be any),
112
the face detection model generates
the detected locations of faces
what can we use the model output
(detected face locations) for?
113
what can we use the model output (detected face locations) for?
counting how many faces there are
drawing the detected face location bounding box on the image
applying an emoji to cover the face
we will be building an app to achieve all of these in a minute!!!
114
go to app construction now…
or if time allows we can dive a bit deeper into the face detection model introduction
115
a face detection model does not generate output in the form of this nice green rectangular bounding box as you see
its output is actually a numeric representation of this green box (recall we can represent all those amazing stuff in numbers?)
based on the number representation of the bounding box, we programme the computer to help us draw out this box
116
how is the detected face location, aka bounding box, represented in numbers?
117
an easier question
how is the location of a single point in an image represented in numbers?
the location of a point is denoted by two numbers, aka coordinate (x, y)
118
x represents the distance (in number of pixels) between the point and the leftmost edge
y represents the distance (in number of pixels) between the point and the uppermost edge
119
now that we know how a single point is represented in numbers
a bounding box is nothing but a combination of its four corner points
once the four corners are known, we just sit back and let the computer draw the lines for us
120
Example of how to represent one bounding box in numbers:
Location of upper-left corner: [0, 0]
Location of upper-right corner: [20, 0]
Location of lower-left corner: [0, 40]
Location of lower-right corner: [20, 40]
One bounding box: [[0, 0], [20, 0], [0, 40], [20, 40]]
Don’t forget the protocol:
<insert your educated guess here>
121
Noodling time:
Given the representation of one bounding box: [[0, 0], [20, 0], [0, 40], [20, 40]]
(with <protocol same as in last slide>)
Can we infer the width and height of this box?
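One way to check your answer is to let Swift do the subtraction. This sketch assumes a particular protocol (corners ordered upper-left, upper-right, lower-left, lower-right, each written as [x, y] in pixels); with a different protocol the indices below would change. The function name `size(of:)` is made up for the example.

```swift
// Assumed protocol: corners ordered upper-left, upper-right, lower-left, lower-right;
// each corner is [x, y] in pixels.
let box = [[0, 0], [20, 0], [0, 40], [20, 40]]

func size(of corners: [[Int]]) -> (width: Int, height: Int) {
    let upperLeft = corners[0]
    let upperRight = corners[1]
    let lowerLeft = corners[2]
    let width = upperRight[0] - upperLeft[0]   // difference in x along the top edge
    let height = lowerLeft[1] - upperLeft[1]   // difference in y along the left edge
    return (width, height)
}

let boxSize = size(of: box)
print(boxSize.width, boxSize.height)  // 20 40
```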
122
Noodling time:
do we really need all four corners’ (x, y) coordinates to be able to draw the bounding box?
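One possible answer, sketched in Swift: for an upright box, one corner plus a width and a height is enough, and the other corners can be rebuilt. The `Box` type is a hypothetical illustration, and the corner order it rebuilds (upper-left, upper-right, lower-left, lower-right) is again an assumed protocol.

```swift
// A hypothetical compact form: upper-left corner plus width and height
// (4 numbers instead of 8).
struct Box {
    var x: Int
    var y: Int
    var width: Int
    var height: Int

    // Rebuild all four corners: upper-left, upper-right, lower-left, lower-right.
    var corners: [[Int]] {
        return [
            [x, y],
            [x + width, y],
            [x, y + height],
            [x + width, y + height],
        ]
    }
}

let compactBox = Box(x: 0, y: 0, width: 20, height: 40)
print(compactBox.corners)  // [[0, 0], [20, 0], [0, 40], [20, 40]]
```

This only works because the box is not rotated, which is one reason landmarks become useful later.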
123
some face detection models can do more than just figure out where the outline of a face is…
recall how, when you see someone’s face in a book, you tend to look into their eyes for a second?
124
some face detection models can do something similar…
they can find the exact locations of eyes and nose-tips, and many more…
these points are called “landmarks”
check what landmarks Apple’s model can find https://developer.apple.com/documentation/vision/vnfacelandmarks2d
125
here is a face detection landmarks output of manga images [2]
126
each landmark is represented by its coordinates
a set of landmarks means a set of coordinates
127
Example of how to represent landmarks in numbers:
Location of right eye mid point: [30, 20]
Location of left eye mid point: [50, 20]
Location of nose tip point: [40, 40]
Location of upper-left corner: [20, 40]
and other points of interest…
Landmarks: [[30, 20], [50, 20], [40, 40] … etc. ]
Don’t forget the protocol:
Arrays are in the order of right eye mid point, left eye mid point, nose tip point, etc.
128
what do we need the landmarks for?
the bounding box can only tell if some region has a face, regardless of its rotation
landmarks can tell us the rotation of the face
we need this information to perfectly overlay emoji
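As a rough illustration of how landmarks reveal rotation, the tilt of the line between the two eye points can be computed with a little trigonometry. This is a simplified sketch with a made-up function name, not the method used in the textbook code: it takes the two eye mid points as [x, y], with y growing downwards as on the coordinate slides.

```swift
import Foundation

// Angle (in degrees) of the line from the right-eye point to the left-eye point.
// 0 means the eyes are level, i.e. the face is upright in the image.
func rollDegrees(rightEye: [Double], leftEye: [Double]) -> Double {
    let dx = leftEye[0] - rightEye[0]
    let dy = leftEye[1] - rightEye[1]
    return atan2(dy, dx) * 180.0 / Double.pi
}

print(rollDegrees(rightEye: [30, 20], leftEye: [50, 20]))  // 0.0, eyes are level
print(rollDegrees(rightEye: [30, 20], leftEye: [50, 40]))  // about 45, head is tilted
```

An emoji rotated by the same angle would line up with the tilted face.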
129
finding faces may seem trivial for our visual system, but it used to be a hard task for machines
we can locate a face and its landmarks in one go, in the blink of an eye; for machines, finding landmarks is another level of difficulty
“simple things are hard”
130
thankfully when coding an iOS app, we just need to type in the right function that’s all
detecting bounding boxes: VNDetectFaceRectanglesRequest()
detecting landmarks: VNDetectFaceLandmarksRequest()
131
By calling VNDetectFaceRectanglesRequest() or VNDetectFaceLandmarksRequest()
We are retrieving the output of apple’s awesome face detection model
Question for later:
Where do we feed input to the model?
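One detail worth knowing when reading this output: as far as Apple’s documentation describes it, the bounding box of a detected face comes back normalised (values between 0 and 1) with the origin at the lower-left corner of the image, which is a different protocol from the “pixels from the top-left” one used in these slides. The sketch below converts between the two in plain Swift. The `Rect` type and the function name are hypothetical; the real app works with CGRect.

```swift
// A rectangle as four numbers. Hypothetical type for illustration only.
struct Rect: Equatable {
    var x: Double
    var y: Double
    var width: Double
    var height: Double
}

// Convert a normalised box (0...1, origin at lower-left)
// into pixels with the origin at upper-left.
func pixelRect(fromNormalized box: Rect, imageWidth: Double, imageHeight: Double) -> Rect {
    let width = box.width * imageWidth
    let height = box.height * imageHeight
    let x = box.x * imageWidth
    // Flip the y axis: measure the box's top edge from the top of the image instead.
    let y = (1.0 - box.y - box.height) * imageHeight
    return Rect(x: x, y: y, width: width, height: height)
}

let detected = Rect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let inPixels = pixelRect(fromNormalized: detected, imageWidth: 200, imageHeight: 100)
print(inPixels)  // x: 50, y: 25, width: 100, height: 25
```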
132
end of face detection model introduction
question?
133
construction time
!!!
https://github.com/XiaowanYi/MLOne-DiplomaInAppleDevAW22-Lec-01
134
preparation 1: which Xcode version are you using?
135
preparation 2:
there will be mostly cutting and pasting from the textbook
don’t be scared — you don’t have to comprehend every single line
136
preparation 3:
to get the cutting and pasting right, pay attention to which function or which object you are pasting into
“the scope”
137
let’s start the project by opening Xcode:
Create a new Xcode project
iOS -> App -> Next
138
step a:
select SwiftUI from the Interface dropdown menu
leave “Use Core Data” and “Include Tests” unchecked (not important for this project)
Click Next and select a folder:
Good practice: create the folder inside a designated working folder
139
step b:
magic dust 1:
Info -> Custom iOS Target Properties -> “+” on any row -> select “Privacy - Camera Usage Description”
140
step c:
let’s look at textbook P171-173 step 3 & 4
don’t worry about error notifications, they will be resolved as we progress
have you seen the familiar VNDetectFaceRectanglesRequest() ?
141
step d:
let’s move to textbook P174 step 1, 2 & 3
note we are moving to a new file (Views.swift)
this is defining UI buttons we will use later
142
step e:
let’s look at textbook P176-177 step 4, 5 & 6
correction 1:
in step 4 first line struct Main View
It should be struct MainView (remove the space in-between)
correction 2:
in step 4 second line private let image: Ullmage
It should be private let image: UIImage (both are capital I, not lowercase L)
143
step f:
let’s look at textbook P177-181 step 7
this is a new and long struct, be careful
144
step g:
let’s look at textbook P181-182 step 8
this is for handling rotations
recall: when do we need rotations?
145
step h:
moving to the ContentView.swift file
146
step i:
let’s look at textbook P182-184 step 9, 10, 11
UI stuff
In step 11 the line with:
.navigationBarTitle(Text("FDDemo"),
you can change the text string to be your own app name
147
step j:
let’s look at textbook P184-185 step 12, 13
148
step k:
let’s look at textbook P185-187 step 14, 15, 16
note: from step 14 we go out of the scope of extension ContentView {} and paste the code directly
149
step L:
let’s look at textbook P187-188 step 17, 18, 19
note: from step 14 we go out of the scope of extension ContentView {} and paste the code directly
150
step m:
patch 1:
continue on step 19, add the following function into struct ContentView: View {}
private func controlReturned(image: UIImage?) {
    print("Image return \(image == nil ? "failure" : "success")...")
    self.image = image?.fixOrientation()
    self.faces = nil
}
151
step n:
magic dust 2:
add placeholder and icon to your Assets (drag and drop to Assets in navigator)
152
building time !!!
153
next: draw the bounding box
154
step o:
let’s look at textbook P190-192 step 1, 2, 3
we are moving to file Faces.swift
this is the code that draws the bounding box
you can customise the box colour in step 3:
context.setStrokeColor(UIColor.red.cgColor)
155
step p:
let’s look at textbook P192 step 5
we are updating the entire getFaces() to incorporate the box drawing function (drawOn())
156
building time !!!
157
next: applying emoji on top
we’ll only be working on the file Faces.swift
158
step q:
let’s look at textbook P197-199 step 1
for image rotation
159
step r:
let’s look at textbook P199-203 step 2
recall: landmarks
VNFaceLandmarks2D here represents all of the landmarks that Apple’s Vision framework can detect in a face.
160
step s:
let’s look at textbook P203-207 step 3, 4, 5, 6
adding extensions on the global scope
161
step t:
second last step!
let’s look at textbook P207-210 step 7
it is replacing the entire extension on Collection
basically it replaces the box drawing function with the emoji placing function
162
if no emoji is shown in your editor, you need to copy and paste the list of emojis from lines 149-157 here
https://github.com/AIwithSwift/PracticalAIwithSwift1stEd-Code/blob/master/Chapter%204%20-%20Vision/Face%20Detection/FDDemo-Improved/FDDemo/Faces.swift
163
step u:
finally!!!
in Faces.swift -> extension UIImage {} -> roughly 4th line
change let request = VNDetectFaceRectanglesRequest() to
let request = VNDetectFaceLandmarksRequest()
(in order to switch from bounding box detection to landmarks detection)
164
building time !!!
165
congrats
don’t be scared about the code,
this lecture is about understanding the practical side of ML
as long as you get the idea of “using ML model output by calling the right function”
166
a gentle summary
numeric representation 1: image, text and sound can be represented using numbers with a protocol
numeric representation 2: a point location within an image is represented using coordinates (x, y)
numeric representation 3: a bounding box within an image can be represented using the coordinates (x, y) of its four corner points
input and output characterise an ML model
Apple’s face detection model can output detected face bounding boxes through the function VNDetectFaceRectanglesRequest()
it can also output landmarks through another function, VNDetectFaceLandmarksRequest()
167
“a screenshot of an ios face detector app”
generated by https://huggingface.co/spaces/stabilityai/stable-diffusion
168
References
ref 1: https://www.sciencedirect.com/science/article/abs/pii/S0022537179902007
ref 2: https://www.semanticscholar.org/paper/Facial-Landmark-Detection-for-Manga-Images-Stricker-Augereau/64cac22210861d4e9afb00b781da90cf99f9d19c
image ref https://animevyuh.org/face-detection-using-opencv/
image ref https://support.wolfram.com/25330?src=mathematica
169